Event Group Similarity Analysis using Latent Dirichlet Allocation

Author

  • Katrin Erk
Abstract

An event, or a predicate and its arguments, is a semantic representation in the task of information extraction (IE) that provides little information beyond the sentence level. This paper describes event groups, sets of related events that encode semantic information across multiple sentences. We define an information retrieval task that uses our proposed event group representation as queries to create a more semantically oriented search. This task requires finding a method to measure event group similarity, which this paper addresses. Our approach approximates the similarity between pairs of event groups using distances between their topic distributions generated by a Latent Dirichlet Allocation (LDA) model. We use the ACE 2005 dataset to manually construct a gold standard corpus with annotated event groups and ground truth pairwise similarity scores. We evaluate the LDA similarity scores against the gold standard and show an average Spearman’s rank correlation coefficient of 0.120.
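
The evaluation outlined in the abstract can be illustrated with a minimal sketch. The abstract does not name the distance measure, so the Hellinger distance used here is an assumption, and the topic distributions and gold scores are made-up toy values, not data from the paper. The function names `hellinger`, `ranks`, and `spearman` are hypothetical helpers written for this illustration.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy topic distributions for four event groups (hypothetical values,
# standing in for the per-group output of a trained LDA model).
topics = {
    "g1": [0.70, 0.20, 0.10],
    "g2": [0.60, 0.30, 0.10],
    "g3": [0.10, 0.10, 0.80],
    "g4": [0.20, 0.50, 0.30],
}

# Toy gold-standard similarity for each pair (hypothetical annotations).
gold = {
    ("g1", "g2"): 0.9, ("g1", "g3"): 0.1, ("g1", "g4"): 0.4,
    ("g2", "g3"): 0.2, ("g2", "g4"): 0.5, ("g3", "g4"): 0.3,
}

pairs = list(combinations(sorted(topics), 2))
# Convert distance to similarity so that higher means more alike.
predicted = [1.0 - hellinger(topics[a], topics[b]) for a, b in pairs]
reference = [gold[pair] for pair in pairs]

print("Spearman correlation:", spearman(predicted, reference))
```

Because Spearman's coefficient compares rankings only, any monotone transform of the distance would yield the same score; other measures over topic distributions, such as Jensen-Shannon divergence, would slot into the same pipeline.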

Similar resources

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

Similarity Measures Based on Latent Dirichlet Allocation

We present in this paper the results of our investigation on semantic similarity measures at word- and sentence-level based on two fully-automated approaches to deriving meaning from large corpora: Latent Dirichlet Allocation, a probabilistic approach, and Latent Semantic Analysis, an algebraic approach. The focus is on similarity measures based on Latent Dirichlet Allocation, due to its novelty ...

Legal Documents Clustering using Latent Dirichlet Allocation

At present, the availability of a large number of legal judgments in digital form creates opportunities and challenges for both the legal community and information technology researchers. This development needs assistance in organizing, analyzing, retrieving and presenting this content in a helpful and distributed manner. We propose an approach to cluster legal judgments based on th...

Comparing Attitudes to Climate Change in the Media using sentiment analysis based on Latent Dirichlet Allocation

News media typically present biased accounts of news stories, and different publications present different angles on the same event. In this research, we investigate how different publications differ in their approach to stories about climate change, by examining the sentiment and topics presented. To understand these attitudes, we find sentiment targets by combining Latent Dirichlet Allocation...

SEMILAR: A Semantic Similarity Toolkit for Assessing Students' Natural Language Inputs

We present in this demo SEMILAR, a SEMantic similarity toolkit. SEMILAR offers in one software environment several broad categories of semantic similarity methods: vectorial methods including Latent Semantic Analysis, probabilistic methods such as Latent Dirichlet Allocation, greedy lexical matching methods, optimal lexico-syntactic matching methods based on word-to-word similarities a...

Publication date: 2017